Forecasting
Here are some facts about forecasting. The first two are true for just about every business bigger than a kid’s curbside lemonade stand.
1. Everything in business is driven by “the forecast.”
2. Forecasts are almost always wrong.
3. Forecasts are either quantitative or qualitative. Quantitative forecasts are causal, explanatory, or time series. Qualitative forecasts are based on judgment and expert opinion. Both quantitative and qualitative forecasts are numerical; it is the method – not the result – that warrants the name.
Causal: Explains why the variable of interest changes.
Explanatory: Explains that the variable of interest changes when one or more other variables change, and by how much, but not why.
Time Series: Doesn’t explain anything; it just says “If there have been no major changes in the business environment lately and if you don’t expect any major changes in the near future, this is what’s coming. More or less.”
4. Forecasts are either aggregate or itemized.
Aggregate forecasts tend to be more accurate, because errors tend to cancel each other out. Suppose we accurately forecast enrollment for UB for next Fall. The CFO will praise us, because we got the tuition revenue exactly right. But suppose we have twice as many engineering students show up as we expected, and half as many liberal arts students. Our planning, staffing, and scheduling are now worthless.
Itemized forecasts can be itemized at a variety of levels. Continuing the UB example, we can forecast enrollment for, say, the School of Business, the Accounting Department, or for Accounting 101. The more finely we focus our attention, the less accurate the forecast is likely to be, because we’re dealing with smaller and smaller samples.
5. Limitations of forecasts.
Some things just can’t be forecasted, at least not with anything even close to useful accuracy. Natural phenomena, like earthquakes and hurricanes, are simply not well enough understood to forecast. Economic phenomena, like booms and busts, or even ordinary economic indicators like the DJIA or unemployment, are notoriously resistant to forecasting techniques.
6. The Forecast “Horizon.”
One of the distinguishing characteristics of a forecast is its horizon – the farther the horizon extends into the future, the less accurate the forecast. That means “the farther you go beyond the region where you have data, the less accurate the forecast will be.”
Local weather forecasters are pretty good at forecasting 4 or 5 days out, but anything more than two weeks into the future is pretty much a guess. The same is true for business forecasts. Short horizons, say, up to a year, can be quite good, but any forecast beyond that is a crap shoot. It is also necessary to know the forecast error. How much “wiggle room” is included? Saying that next summer will be warmer than winter isn’t terribly helpful.
Time Series Forecasting (TSF):
Basic Assumptions:
1. The future will be an extension of the past. Everything that’s been happening will continue to happen, at least across the forecast horizon. (If this assumption isn’t warranted, then this technique is worthless.)
2. All the causes of variation in the variable of interest are imbedded in the data. The causes of variation, whatever they are, caused the variable of interest to do what it did. Therefore we don’t need to know what those causes are, just that they will continue as they have in the past.
TSF compared with other methods:
Multiple Regression (MR): You choose the variables on which you think the variable of interest depends. You form a linear relationship among the variables and the computer determines the coefficients, like this:
y = ax1 + bx2 + … + kxn + C
Here, y is the variable of interest, say, next month’s sales, and the x’s represent the independent (explanatory) variables we believe it depends on, say, the amount of money we spend on advertising, current inventory levels, and so on. Plug in the values of the independent variables, apply the coefficients the computer found, and out comes next month’s sales – in theory.
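Although this project uses TSF rather than MR, here is a minimal sketch, in Python/numpy, of how a computer “determines the coefficients” for an equation like the one above. The numbers and the predictor names (ad_spend, inventory) are invented purely for illustration.

import numpy as np

# Made-up monthly history: two candidate predictors and the sales they supposedly drive.
ad_spend  = np.array([10, 12, 9, 15, 14, 16], dtype=float)      # advertising spend ($000)
inventory = np.array([200, 180, 220, 150, 160, 140], dtype=float)
sales     = np.array([52, 58, 49, 66, 63, 69], dtype=float)

# Design matrix: one column per x, plus a column of 1s for the constant C.
X = np.column_stack([ad_spend, inventory, np.ones_like(sales)])

# Least-squares estimates of a, b, C in  y = a*x1 + b*x2 + C
(a, b, C), *_ = np.linalg.lstsq(X, sales, rcond=None)

# "Forecast": plug in next month's planned values of the x's.
print(a * 17 + b * 130 + C)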
Assumptions of MR:
1. The variable of interest, y, is dependent on the xi’s.
2. The variable of interest depends on all of them and does not depend on anything else.
3. Although the usual formulation requires that the variable of interest be linearly dependent on each of the independent variables, non-linear relationships can be included.
Problems:
1. If the assumptions are wrong, the forecast will be less valid.
2. If you forgot an important variable, the forecast is less valid and less accurate.
Advantages of TSF vs. MR:
1. You cannot leave out an important variable, or include an unimportant variable.
2. In terms of implementation, Excel’s built-in Regression tool is static – you have to re-run it each time you make a change. (We will generate the important outputs ourselves with live formulas, so they’ll respond immediately to any changes.)
Disadvantages of TSF vs. MR:
1. You gain no insight into the causes of variation.
2. Occasionally, summed periodic dependencies appear to be aperiodic. In ordinary American this means that if your sales are subject to seasonal variation as well as longer term business cycles, the combination of both effects may mask any apparent regularity and appear to be basically random.
Moving Averages (MA):
1. Moving Averages are simple, but they do not account for periodic variation and they are lagging indicators.
2. They are only valid for “stationary” series (i.e., no trend, no cycles). Note: This is what books say. Moving averages will follow consistent trends very nicely, and depending on the period you choose, can “see through” cyclical or seasonal data.
3. MA places equal weight on all data values, regardless of how long ago they occurred, but there are variations on this technique (see next section) that preferentially emphasize more recent data and can account for cyclical variation. But this technique is still a lagging indicator.
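For concreteness, here is a minimal sketch of a moving average in Python/pandas. The 24-point series is invented, and the 12-point window is simply the “usual” monthly choice discussed later in these notes.

import pandas as pd

sales = pd.Series([100, 95, 110, 120, 105, 130, 125, 115, 90, 100, 110, 140,
                   106, 101, 117, 127, 111, 138, 132, 122, 95, 106, 117, 148])

# Each moving-average value is the mean of the 12 most recent observations,
# which is exactly why it lags the data it is summarizing.
ma12 = sales.rolling(window=12).mean()
print(ma12)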
Exponential Smoothing (ES):
Exponential smoothing is like a moving average, but it places more weight on more recent values and less and less weight on values from the past. If F(t) is the smoothed value for time period t and A(t) is the actual value for time period t, then the forecast for the next period is the current smoothed value: Forecast(t+1) = F(t) = αA(t) + (1 – α)F(t – 1), where α is a constant between 0 and 1.
1. ES is simple, but depends on your choice of α, and is a lagging indicator.
2. ES is valid only for stationary series.
3. ES can be extended to account for linear trends. When it is, it’s called “Holt’s Method.” If you’re interested, you can Google this.
4. ES can be extended to account for linear trends and seasonal variation. When it is, it’s called “Winters’ Method.” You can also Google this.
5. TSF does all this (and more) at least as easily. Further, TSF can relatively easily account for non-linear trends and multiple cycles “at no additional cost.”
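Here is a matching sketch of simple exponential smoothing in Python, assuming α = 0.3 and an invented series; seeding the smoothed series with the first actual value, F(1) = A(1), is a common convention rather than a rule.

actual = [100, 105, 98, 110, 107, 115]        # A(t), made-up
alpha = 0.3

smoothed = [actual[0]]                         # seed: F(1) = A(1)
for t in range(1, len(actual)):
    # F(t) = alpha * A(t) + (1 - alpha) * F(t - 1)
    smoothed.append(alpha * actual[t] + (1 - alpha) * smoothed[t - 1])

forecast_next = smoothed[-1]                   # the forecast for the next period is the latest F(t)
print(smoothed, forecast_next)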
Time Series Forecasting (TSF):
A “time series” is a set of equally spaced observations of the variable of interest made over time. In general, the method that follows works like this.
First we will “clean up” the data by removing outliers, if any. Then we will compute a non-linear trend line and extend that line into the future across the forecast horizon. Next we’ll calculate any seasonal effects and add them to the trend.
This method works better than any other for non-random time series. Unfortunately, most economic indicators (especially the stock market) are pretty much random. Fortunately, most business data (like sales, revenue, costs) are much less random.
TSF – The Details:
1. Plot the data, y(t) vs. t. What does it look like? The variable of interest (hereafter, just “y(t)”) changes as time goes by. If y(t) exhibits a pattern, it is an ideal candidate for TSF. The pattern can be complex, perhaps including both seasonal and cyclical variations on top of variable rates of rising and declining, but is nevertheless a pattern.
If there is no detectable pattern, and the time series is essentially random, only the most general kind of forecast is possible. These tend to be much less helpful.
Figure 1. Sample Monthly Sales Data
Our example data looks like what’s shown in Figure 1: 48 months of sales data, and our job is to forecast next year’s sales, by month. When we look at this data, it looks a little like a seismograph recording. There is a general trend of rising sales, but at first glance, there’s little else that seems at all helpful and forecasting next year’s monthly sales seems impossible.
But if we look closer, there seem to be some peaks and valleys that might occur regularly. Using Excel to identify which months have the highest sales, and our eyes to look at the actual data, we notice that the 6th month and the 18th month have noticeable peaks, and that the 9th month and the 21st month have noticeable valleys, or troughs. Then we notice that the peaks are 12 months apart, as are the troughs. Further examination shows that this pattern repeats across all 48 months. Oh, goody! That’s lucky for us, as we’ll soon enough see.
2. Remove outliers, if any. Outliers are the values of y(t) that represent rare, non-repeating occurrences. Statistically, outliers are points more than ±3σ from the mean. A better method for recognizing outliers is to remove points that are more than ±3σ from the trend-line.
A word of warning is necessary here. The computer will find all the points that are outside the standard statistical requirement for being an outlier. These points may or may not actually be outliers and you will have to use your judgment. An example is jewelry sales in the U.S. Jewelry sales are fairly constant throughout the year but peak dramatically in December. I’ve seen data for jewelry sales and the computer invariably says that all the December sales are outliers. Statistically, they are outliers, but that’s the nature of the business. The key phrase for determining which points are outliers is “rare, non-repeating” points.
Excel formula for identifying outliers: Name the range containing the data, "Data". Then, in some cell above that range, type =AVERAGE(Data). Name that cell "Avg". In the cell just below the average, type =STDEV.S(Data). Name that cell "StdDev". In the column next to the data, type =IF(ABS(Data-Avg)>3*StdDev,"OUTLIER",""). Pop quiz: Why do we need the absolute value (ABS) function?
A better formula (trend-based outliers):
i. In column A, number the data values as 1,2,3, etc. Name that range “PtNbr” or “Period”.
ii. With the chart selected: Chart Tools → Design → Add Chart Element → Trendline → More Trendline Options → check “Linear” and “Display Equation on chart” → Close. (Left-click on the equation and drag it to where you can see it.) [You can also left-click on the “+” sign at the top right corner of the graph and go directly to “Trendline.”]
iii. Copy formula into col C, using data point numbers in col A as “x” values as follows:
=2.54*PtNbr+643.1 (for example) and copy down. Name this range “LinearTrend”. Of course, you will use the real coefficients based on your data. Pop Quiz: What do the constants represent, in business terms? The answer is NOT “slope” and “intercept”! While that’s correct for your algebra class, we need to understand what the numbers mean in business terms.
iv. Create column D as the difference between the linear formula and the data values.
=Data-LinearTrend and copy down. Name this range “Diff”.
v. Find the standard deviation of the differences and check them for outliers. Pop quiz: Why is this better?
Outliers are replaced as follows: y(t) = ½(y(t–1) + y(t+1)). In the rare case where there are two outliers together, the first is replaced by (2/3)y(a) + (1/3)y(b) and the second by (1/3)y(a) + (2/3)y(b), where y(a) and y(b) are the good values just before and just after the pair. These formulas are what you will find in reference books. It is an extremely marketable skill to be able to translate formulas from books into Excel. The key to doing this is to understand what each symbol represents and where those values are on the spreadsheet. In this case, y(t) represents the outlier value at time t, and y(t–1) and y(t+1) represent the data values just before and just after the outlier value.
There is a caveat here – if yt is at a peak (or trough) of the time series, you need to use your judgment.
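Here is a rough Python equivalent of the trend-based check and the neighbor-average replacement described above. The series is generated just so the sketch runs; the planted spike, the 3-standard-deviation cutoff, and the linear trend mirror the Excel steps.

import numpy as np

rng = np.random.default_rng(0)
t = np.arange(1, 25, dtype=float)
data = 650 + 2.5 * t + rng.normal(0, 3, t.size)   # roughly linear made-up sales
data[11] += 150                                   # plant one spike to play the outlier

# Linear trend, like the chart trendline formula (slope*PtNbr + intercept).
slope, intercept = np.polyfit(t, data, 1)
diff = data - (slope * t + intercept)

# Flag points more than 3 standard deviations away from the trend.
is_outlier = np.abs(diff) > 3 * diff.std(ddof=1)

# Replace each flagged point with the average of its neighbors: y(t) = (y(t-1) + y(t+1)) / 2
cleaned = data.copy()
for i in np.where(is_outlier)[0]:
    if 0 < i < t.size - 1:
        cleaned[i] = (cleaned[i - 1] + cleaned[i + 1]) / 2

print(np.where(is_outlier)[0] + 1, cleaned.round(1))   # +1: report the period number, not the index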
3. Replace missing values. Use the same method as for removing outliers.
4. Find the trend. Here’s where this whole TSF technique shows its value. What we’re going to do is called “multiple regression.” This whole topic is usually covered in statistics courses under the title of “Correlation and Regression.” I’ve uploaded a fairly detailed description of this topic and its application to TSF to Canvas in a file called “Mgt301-Correlation&Regression”. Read it on your own and ask questions in class about anything you don’t understand.
Now we all know that the equation y = ax + b is a straight line. Very few real-life situations are straight lines. They’re used because it’s a simple equation and back in the day, before cheap and powerful computers, that’s all we could do by hand. Now we can use better equations, but there’s something I need to say before we go any further.
Excel offers a number of built-in regression equations, including polynomials up to degree six, logarithmic, exponential, and moving average equations. These are almost never appropriate because time-based business data (like sales) doesn’t change terribly rapidly. Even a second-order polynomial rises and falls more rapidly than real business data. Exponential equations are frequently used, but they’re only good for a limited time frame. Any CEO who tells his people and/or stockholders that “our goal is to maintain 5% growth” doesn’t understand basic math. A constant percentage growth rate is an exponential function, like the one you learned in Finance 309: FV = PV(1 + r)^n, where r is the periodic interest rate and n is the number of periods. If you graph this, it eventually rises literally “exponentially fast.” Exponential growth simply cannot be sustained.
So what we need is an equation that doesn’t rise or fall too rapidly, but does allow for little dips and rises as time goes by. Here is such a function: y(t) = a·t^(3/2) + b·t + c·t^(1/2) + d·t^(–1/2) + e.
The first term rises faster than linear (or falls, depending on the sign of a), the second term is linear, the third rises more slowly than linear, and the fourth decays slowly toward zero, which lets the combination bend and dip without rising or falling too quickly. The last term is a constant, which moves the whole curve up or down without changing its shape.
All we need to do is figure out what the coefficients ought to be, so that our trend equation describes, as accurately as possible, the correlation between sales and time. Here’s how.
We are going to determine how well our data correlates with time. “Time” for us is simply the chronological progression of the data. That means the first time period is “1”, the second is “2”, and so on for as many time periods as you have for your data.
This translates, mathematically, to finding the coefficients of the function as described on the previous page. When you want to find out how to do that, you can Google it, but to do that in a way that Google can figure out what you’re asking you should know that what we’re doing is called “finding a least squares fit of an over-determined non-linear regression equation.” Whew! You can also look through an applied math book for the same thing. When you do either of those things, you’ll find that the solution for the coefficients is given by
c = (XᵀX)⁻¹Xᵀy.
Not much help, huh? Here’s what it’s saying.
Although this looks like advanced math, involving matrices, their transposes, products, and inverses, its derivation is based on the following. (The transpose of a matrix is found by reversing its rows and columns: the first row becomes the first column of the transpose, the second row becomes the second column, and so on.)
The simple equation, 3x = 7, can be solved using fancy words, like this. “Pre-multiply both sides by the inverse of the coefficient.” That gives 3⁻¹·3x = 3⁻¹·7, or (1/3)·3x = (1/3)·7, or x = 7/3. OK? Now let’s remember systems of equations from high school, something like this:
2x + 3y = 8
4x – y = 2
This is still fairly easy to solve, right? Now, I can write that system using matrix notation as follows:
[ 2  3 ] [ x ]   [ 8 ]
[ 4 –1 ] [ y ] = [ 2 ]
(If your high school was like my high school, you remember this; otherwise, trust me.) I can express this in general notation, like this:
Ax = b,
where A is the coefficient matrix, x is the vector of unknowns, and b is the right-hand side.
Now this matrix form of the equation is an exact analogue of the original equation, 3x = 7, and the same rules apply for solving it, namely, pre-multiply both sides by the inverse of the coefficient. Doing that gives
x = A⁻¹b,
which is the solution for any system of equations for which the coefficient inverse exists. Something similar produces the equation above, but it’s slightly more complicated because the coefficient matrix isn’t square. Here’s what that means.
In the work we’ll be doing, we usually have more equations than unknowns. You know that we can solve two equations in two unknowns, 3 equations in 3 unknowns, and so on. But we will have, for example, 48 equations in 5 unknowns. This is called an “over-determined” system and the method for solving it is called “Least Squares” minimization. That is, we will find a function that minimizes the total squared distance of each of the points from the function. Its derivation isn’t important for us, but the solution is.
So – we have five terms in the function we think will work well. We need a column of the values for each of them, as follows.
i. First create 5 columns, labeled “T^(3/2)”, “T”, “T^(1/2)”, “T^(–1/2)”, “Const”. In this case, “T” represents “time” rather than “transpose.”
ii. In each column, insert the formulas =PtNbr^(3/2), =PtNbr, =SQRT(PtNbr), =1/SQRT(PtNbr), 1. Notice that the formula matches the column heading except for the constant term. Copy these formulas down to the bottom of the data and make sure that Excel doesn’t increment the constant term; it should be “1” all the way down. This is the matrix “X” in the equation above.
iii. Name the range of these values “X”. (Highlight only the numbers, not the column headings!)
iv. Name the data values “Y”. (Same thing – only the numbers.)
To make the next instruction clearer, here is the equation again.
c = (XᵀX)⁻¹Xᵀy.
As usual, we read this equation “from the inside out.” The inner-most term is XᵀX. We compute that product with MMULT (which means “matrix multiply”) and then take its inverse with MINVERSE (“matrix inverse”).
The next term we need is Xᵀy, which is another MMULT. Finally we need to multiply (XᵀX)⁻¹ by Xᵀy, with one more MMULT.
This will give us the answer we want, but the numbers will be in a column and we want them to be in a row, so the last operation is to TRANSPOSE that column to a row.
So – finally we’re ready!
v. Highlight one row immediately above the column headings of X and without touching the mouse, enter the following array formula. This is all one formula; it doesn’t fit on this page, but you will write it all in one line in Excel.
=TRANSPOSE(MMULT(MINVERSE(MMULT(TRANSPOSE(X),X)),
MMULT(TRANSPOSE(X),Y))) and press CTRL-SHIFT-ENTER.
These 5 values are the coefficients of the trend equation. Name this range “Coeffs”. Even though the explanation took almost 3 pages, all you have to do is enter the formula; it takes less than 30 seconds.
Once you have the coefficients of the equation of the trend that most closely follows your data, write that equation into an appropriate column, using the “data point number” cells, which you’ve named “PtNbr”, as values of t. Let’s assume that X is in columns I through M and that the first row of numbers is row 6. Label column N (in cell N5) as “RegrLine”. Then, in cell N6, write the following formula:
=SUMPRODUCT(Coeffs,I6:M6) and copy this equation down through the last data value. Name this range “RegrEqn” or “NonlinearTrend”.
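If you ever want to sanity-check the array formula outside Excel, here is a minimal numpy sketch of the same calculation. The sales values are placeholders, and lstsq solves the same least-squares problem as c = (XᵀX)⁻¹Xᵀy, just in a numerically safer way.

import numpy as np

sales = np.array([4.65, 4.56, 4.57, 4.46, 4.59, 4.71, 4.66, 4.67,
                  4.63, 4.77, 4.69, 4.74, 4.79, 4.72, 4.84, 4.85])   # placeholder data (y)
t = np.arange(1, sales.size + 1, dtype=float)                        # the PtNbr column

# The five basis columns: t^(3/2), t, t^(1/2), t^(-1/2), and the constant.
X = np.column_stack([t**1.5, t, np.sqrt(t), 1 / np.sqrt(t), np.ones_like(t)])

coeffs, *_ = np.linalg.lstsq(X, sales, rcond=None)   # same answer as (X^T X)^(-1) X^T y
trend = X @ coeffs                                    # the RegrLine / NonlinearTrend column
print(coeffs.round(4))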
5. If the data is seasonal, add the seasonal effects to the trend line. When I use the word “seasonal,” I’m not implying that the data changes with the seasons the way the weather does. Often, business data does change with the seasons because of holidays, summer vacations, and so on, but “season” actually represents any cyclical pattern that repeats as time passes.
i. Decide on the length of the “season.” How many data points are there before the pattern repeats exactly? If the pattern repeats almost exactly, then that’s not good enough; your data isn’t seasonal. The good news is that doing this step for non-seasonal data doesn’t hurt much. (Pop quiz: Why not?)
ii. Create a Moving Average (MA) of length “season.” This is usually 12 for monthly data, 4 for quarterly data, 5, 6, or 7 for daily data, and so on. Regardless of what’s “usual,” use your eyes and your brain to find the season length. Here’s how to do that.
The example data has peaks every June and November, and a trough every January, April, and September. We can see this simply by looking at the plot of the data. Since the pattern of higher and lower sales repeats every 12 months, we’ll create a 12-month moving average.
A “moving average” is the average of sales for one full period, which in our case is 12 months. We take a full period average because the pattern repeats every period and because the peaks and troughs will be “averaged out,” i.e., the above-average months and the below-average months, when averaged together, should create a rather smooth line.
The average of the first 12 months is found with a simple AVERAGE function, namely, =AVERAGE(D6:D17) if the actual raw data is in column D. As we “move” the average, the next 12 month period is from D7:D18 (February through January), the next is from D8:D19 (March through February), and so on.
But where do we write this equation? We now have column N that contains the trend equation. Let’s put the moving average in column O. So we’ll label this column (in cell O5) “12-Mo MA”.
The average monthly sales for the first year really ought to be put at the midpoint of the year. The midpoint of the year is after June and before July, but there’s no cell for “after June but before July” so we’ll put the formula in the cell for June, just because.
Now copy the formula down until we get to the last June in the data. The formula at that point should be, for our example data, =AVERAGE(D42:D53). D53 is the last cell that has any data in it.
iii. To make the MA reflect the true midpoint of the year, we need to “center” it if the length of the season is an even number. The value that’s in the first June cell really ought to be between June and July. The value that’s in the first July cell really should be between July and August. So if we find the average of those two cells, we’ll get the correct average for “July.”
So, we’ll create a column for “CMA” (Centered MA). In cell P5, enter the label “CMA.” In the cell for “July” we’ll average the MA values for June and July with the formula =AVERAGE(O11,O12) and copy that down to the last MA value.
iv. Look for “inflection points.” An inflection point is a point where the CMA makes a rather abrupt change of direction. (There’s an illustration of this in the file “Mgt301-Correlation&Regression.”)
The basic assumption of this method is that there have not been any major changes in the business environment at least as far back as your data goes, nor will there be any major changes in the business environment for the period of the forecast horizon. The CMA is especially valuable for finding such changes.
The method is simple: just graph the raw data and the CMA on the same graph. If there’s an inflection point, you cannot use any of the data that accumulated prior to the inflection point. Pop quiz: Why not?
If you find an inflection point, throw away all the data prior to that point and start over with the remaining data. (Don’t really throw it away, because when you get to writing up your results, you’ll need to show the original data with the inflection point before proceeding with the rest of your report.)
v. Calculate the Raw Seasonal Index (RSI). A “seasonal index” is the ratio of actual sales for a particular month compared to the average sales for that year. Because the “average sales” is a moving average, we always get the precise seasonal index for a particular month based on the average for which that particular month is the middle month. (That’s a confusing sentence; you may need to read it several times.)
So – another column. Write the label “RSI” in cell Q5. For the first value for which the CMA exists write the formula =D12/P12 and format it as a percentage. The result will be how much that month’s sales exceed or fall short of the average. Then copy that formula down to the last cell for which the CMA exists.
When you look at the values, you’ll notice that September, for example, always has a value less than 100%. For the example data, the values are in the upper 80’s, which means that September sales are about 86% or 87% of average. June, on the other hand, has values that are 10% to 15% above average.
vi. Calculate the Average Seasonal Index (ASI). This goes in column R. You noticed that the RSI values for a particular month are approximately the same, but not exactly the same. So this step will find the average RSI for all occurrences of the same month.
This is an annoying formula – it is the average of all the RSI values for their particular point in the data. For example, if the data is monthly, it is the average of all the July values of RSI, then all the August values, and so on. You can’t simply copy the formula down because the cell references will advance so that they no longer include data toward the top of the sheet and will include non-existent data at the bottom.
The first RSI value we have in our example data is for July, so the average RSI for all Julys is =AVERAGE(Q12, Q24, Q36). We will copy this formula down for 12 months, if possible. In this case it is possible; that won’t always be the case. The important point is to write a formula that includes all the occurrences of each month.
Once you’ve calculated the ASI for 12 months, you know the ASI for all months. So you can extend the ASI for all the months of data and into the future across the forecast horizon by simply copying the 12 calculated ASI values. Name this range “ASI”.
So, for the first January, the formula in cell R6 is simply =R18. This can be copied down through the first June. From the second July, the formula is =R12 and that can be copied as far into the future as your horizon extends.
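Here is a compact pandas sketch of sub-steps ii, iii, v, and vi (12-month MA, centering, RSI, ASI), assuming 48 months of data. The series is generated just so the sketch runs; the alignment (first CMA landing on the first July) matches the spreadsheet layout described above.

import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
months = np.arange(1, 49)                                      # period numbers 1..48
sales = pd.Series(100 + 1.5 * months                           # made-up trend
                  + 10 * np.sin(2 * np.pi * months / 12)       # made-up 12-month season
                  + rng.normal(0, 2, months.size), index=months)

ma12 = sales.rolling(12).mean()                 # 12-month moving average (labelled at the window's end)
cma  = ma12.rolling(2).mean().shift(-6)         # centered MA, re-aligned to the middle month (first July)
rsi  = sales / cma                              # raw seasonal index: actual vs. that month's centered average
asi  = rsi.groupby((months - 1) % 12).mean()    # average seasonal index, one value per calendar month (0 = Jan)

print(asi.round(3))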
6. Add the seasonal effects to the trend line. This is easy – just multiply the trend equation value by the ASI. Put the label “Model” in cell S5, the formula =RegrEqn*ASI in cell S6 and copy down to the last data value.
Now graph the raw data (column D) and the Model data (column S) on the same graph. For our sample data it should look like the graph below (Figure 2). The agreement of the Model and the data is so close that it’s hard to tell which line is which.
7. Extend the model across the forecast horizon. This, too, is easy.
Extend the time periods across the horizon. Our forecast horizon will be one year, so that means you need to extend the numbers in column A from 49 through 60. Highlight the bottom-most equation columns (I53:M53) and copy them down through the horizon. Copy the ASI formula across the horizon. Finally, copy the model formula across the horizon. Name this range “Model”.
When you graph the result (Data and extended Model), you’ll see the forecast. It’s shown on the next page.
Figure 2: Example Data and Model
Figure 3: Example Data and Next Year’s Monthly Forecast
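In code terms, extending the model across the horizon (step 7) is just: evaluate the five trend terms at the future period numbers and multiply by the matching month’s ASI. A tiny sketch with placeholder coefficients and seasonal indices (yours will come from steps 4 and 5):

import numpy as np

coeffs = np.array([0.0005, -0.02, 0.6, 1.5, 3.0])            # placeholder trend coefficients (step 4)
asi = np.array([0.99, 1.00, 1.01, 0.98, 1.00, 1.12,          # placeholder ASI for Jan..Dec (step 5)
                1.02, 1.00, 0.87, 1.03, 0.99, 0.99])

t  = np.arange(49, 61, dtype=float)                          # the forecast horizon: periods 49-60
Xf = np.column_stack([t**1.5, t, np.sqrt(t), 1 / np.sqrt(t), np.ones_like(t)])

forecast = (Xf @ coeffs) * asi[(t.astype(int) - 1) % 12]     # Model = trend x seasonal index
print(forecast.round(2))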
The only thing left to do is to label the graphs nicely. They’ll need a title, axis titles, and actual dates on the x-axis.
In the professional world (though not required for this project), you would explore other curve-fitting techniques and other forecasting methods. Choose the forecast that best matches your “gut instinct,” which is based on your intimate and detailed knowledge of your particular industry and your particular requirements. The method given here, however, is often the best.
Now comes an absolutely critical part of any forecast – an error analysis.
8. Perform various error analyses.
i. First, calculate and plot the errors. The source of errors is always the model; the data can never be wrong, it simply is what it is. So the error is the discrepancy between the data and the model, expressed as a percentage relative to the data. Here’s how.
Yet another column, this time column T. Put the label “PctErr” in cell T5. In cell T6, the formula is =(Model-Data)/Data; format it as a percentage. Then copy the formula down to the bottom of the data. (There’s no way to know what the future error will be until we get the future data.) Name this column “PctErr”.
Create a graph of the errors. If the model has captured all the information that’s in the data, the errors will be small, random, and approximately normally distributed. We need to check each one of those criteria.
ii. Is the error small? “Small” is a subjective judgment. Most people define “small” as meaning “if the average absolute value of the errors is less than 10% then the errors are small.”
That measure of small-ness is called the MAPE, or Mean Absolute Percentage Error. Its formula is =AVERAGE(ABS(PctErr)), entered with CTRL-SHIFT-ENTER as an array formula. Put this formula in cell T4 and the label “MAPE” in cell T3.
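The same calculation in Python, assuming data and model are arrays of equal length (the numbers here are placeholders):

import numpy as np

data  = np.array([100.0, 110.0,  95.0, 120.0, 103.0])
model = np.array([ 98.0, 113.0,  99.0, 116.0, 104.5])

pct_err = (model - data) / data            # same sign convention as the PctErr column
mape = np.mean(np.abs(pct_err))            # Mean Absolute Percentage Error
print(f"MAPE = {mape:.1%}")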
iii. Is the error random? “Randomness” is a surprisingly difficult topic. We are hard-wired from hundreds of thousands of years of evolution to recognize patterns. If I asked you to identify which of the following digit sequences is random, which would you choose? 1-2-3-4-5; 3-9-2-7; 4-4-4-4-4. I’ll bet that most of you chose the center sequence. The answer is that all of them could be random. We have such an automatic response to patterns that it’s hard to believe that 1-2-3-4-5 could possibly be random.
People who’ve studied “random” sequences have come up with two measures of randomness called “runs tests.” We will perform both of them and we hope that our model passes both of them.
The first runs test is the number of “above/below” runs that a sequence has with respect to the average value of the sequence. The idea is that a random sequence will be above average and then change to below average an expected number of times.
The expected number of “above/below” runs for random data is n/2 + 1, where n is the number of data points, and the standard deviation of those runs is √((n – 1)/2). If you recall your basic statistics, we can form a 95% confidence interval estimate of the true number of runs from our sample by calculating (n/2 + 1) ± 2·√((n – 1)/2), where “2” is the real-world’s value for Z or t for 95% confidence. If the number of above/below runs is within that interval, we’ll agree that our errors pass the above/below runs test.
The second runs test is an “up/down” test. A random sequence will change direction an expected number of times. The expected number of “up/down” runs for random data is (2n – 1)/3 and the standard deviation is √((16n – 29)/90). As before we’ll use 95% for our confidence level and the interval becomes (2n – 1)/3 ± 2·√((16n – 29)/90). If the number of up/down runs is within that interval, we’ll agree that our errors pass the up/down runs test.
Now, finding the number of above/below and up/down runs is a pain in the neck (and eyes) if you try to count them from the graph. So we’ll get Excel to do it for us.
The technique that Excel uses in its regression tool and which I’ve detailed for you in step 4 (page 8) guarantees that the average error will be zero. That means the end of an above-run (or below-run) occurs when the error plot crosses the x-axis. That means that when the error values change sign, we should count an above/below run.
Now, we can’t know whether the first value crosses the x-axis; it is either above or below. The first possible chance for a sign change is when we look at the second value. Therefore the formula begins in the second row of data. The formula is =IF(SIGN(X6)=SIGN(X7),0,1) and it goes in row 7. (I don’t know that the column will be X – that depends on where you choose to put this column of values.)
Counting up/down runs is a little trickier. Here, we have to count each occurrence of a change in the sign of the slope of the error graph. Be able to tell me why the following is the correct Excel formula and why the first occurrence of the formula is in the third row of data. =IF((X7-X6)*(X8-X7)>0,0,1).
When these formulas have been copied down to the bottom of the data, simply SUM the column to find the number of runs. Then write the formulas for the confidence intervals and you’ll know whether or not your model’s errors pass these randomness tests.
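Here is a sketch of both runs tests in Python, using the expected-value and standard-deviation formulas given above; errors stands in for the PctErr column and is invented here.

import numpy as np

errors = np.array([ 0.01, -0.02,  0.015,  0.03, -0.01, -0.025,  0.02, -0.005,
                    0.01, -0.015,  0.005,  0.02, -0.01,  0.03, -0.02,  0.01])
n = errors.size

# Above/below runs: a new run starts at every sign change (the mean error is ~0).
# The count of sign changes is one less than the count of runs, hence the +1.
ab_runs = 1 + int(np.sum(np.sign(errors[1:]) != np.sign(errors[:-1])))
ab_exp, ab_sd = n / 2 + 1, np.sqrt((n - 1) / 2)

# Up/down runs: a new run starts whenever the direction of movement changes.
d = np.diff(errors)
ud_runs = 1 + int(np.sum(np.sign(d[1:]) != np.sign(d[:-1])))
ud_exp, ud_sd = (2 * n - 1) / 3, np.sqrt((16 * n - 29) / 90)

print("above/below:", ab_runs, "interval:", (ab_exp - 2 * ab_sd, ab_exp + 2 * ab_sd))
print("up/down:   ", ud_runs, "interval:", (ud_exp - 2 * ud_sd, ud_exp + 2 * ud_sd))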
If your error analysis fails one or both of the randomness tests, that means that there’s information in the data that’s not captured by your model. That means that your model can be improved. If you can’t improve your model, make sure that you include an appropriate disclaimer when you write up your results.
iv. Are the errors normally distributed? There are two ways to do this. The first is to use a χ² (chi, pronounced “ki”, rhymes with “high”, squared) test and the other is to calculate a MAPE for your error distribution relative to a normal distribution. We won’t use either.
All we’re going to do is create a frequency histogram of the errors and say that the errors either are or are not approximately normally distributed. I expect that you’ll tell the truth here and not pretend that the results are better than they are.
9. Establish a confidence bound for your forecast. Again, from basic statistics, you’ll do this by adding a 2σe band (95% confidence limit) to your model and the forecast. The σe term is the standard deviation of the errors. Your forecast is now yf ± 2σe and you are ready to write up your results.
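In Python terms, assuming forecast holds the extended model values and err holds the historical model-minus-data errors in the same units as sales (both invented here):

import numpy as np

err = np.array([1.2, -0.8, 0.5, -1.5, 0.9, -0.3, 1.1, -0.7])   # model - data, in sales units
forecast = np.array([105.0, 107.5, 110.2, 112.8])               # extended model values

sigma_e = err.std(ddof=1)              # standard deviation of the errors
lower = forecast - 2 * sigma_e         # ~95% band: forecast +/- 2 sigma
upper = forecast + 2 * sigma_e
print(lower.round(1), upper.round(1))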
10. Write-up:
Despite all your work in getting to this point, very little of that work (except the results) will be included in this short (2-page) write-up. (This is annoying. You’ve done a lot of work to get these numbers and you don’t get to talk about it. That’s the nature of almost all data analysis work that you’ll do. The boss wants the answer, or at best, the Cliff Notes version of your efforts. The good news is that if you are asked about how you arrived at your forecast, you’ll be on solid ground.)
If any of you have ever been in the Army, you’ve learned that there are three sections to every report or presentation: start by telling them what you’re going to tell them, then tell them, and finish by telling them what you told them. That’s basically what you’re going to do in this write-up. (It’s also very close to the way that a graduate thesis is organized.)
The first thing you need to know to do a write-up like this is who your audience is. I’m your audience. I know what you’ve done. You don’t want to give me a tutorial on time series forecasting; I’ve already been there, done that.
The next thing to know is to say only what needs to be said. I can’t emphasize this enough. Bosses don’t have a lot of time to read a long memo. (And as of now, I have 63 of these reports to read. Do yourself a huge favor and don’t make me read more than necessary! But – be sure to say all that needs to be said. Writing is easy. Writing well is hard.)
Begin by treating this as a business memo. That means the heading of your memo will be something very much like this:
To: Prof.
From: Student
Date: mm/dd/yyyy
Subject: Forecast of <what you’ve chosen to forecast> for <dates of your forecast horizon>
Your report will have five sections. Each will be a bold section heading. Here they are.
Introduction:
This section introduces the paper, not the topic. It includes where you got the data, and a brief description of what will follow. For example, “The data for XXX can be found at XXX.com/YYY/zzz. (When I click on this link, I will go directly to your data; I don’t want to have to navigate through some web site.) First I will describe the method used and the assumptions made in this analysis. Next I will present the actual data and the model. Then, I will present both a graphical and tabular forecast and an error analysis. Finally, I will present my recommendations. The details can be found in the Appendix.” This is all that’s required of the Introduction.
The last sentence, about a detailed Appendix, is optional. If you include it, I probably won’t read it, you won’t get extra points for including it, but could lose points if you choose to include it and don’t do it well. I’ve included it here because some bosses want to see it, or at least be able to see it if they want or need to check your reasoning.
Method and Assumptions:
This section says that you will use TSF. State the assumptions of the TSF method, whether or not you made any adjustments, and why you made them. The accuracy of the data, i.e., the reliability of the source and/or how it was measured, is also an assumption.
Results:
This is where you show me the results of your work. Include the following graphs: the raw data by itself; the raw data with the model extended across the forecast horizon and with the 2σ error bands added to the extended trend. You also will include a table of your forecasted values plus/minus the error bands, nicely labeled, and the trend equation, professionally written. (See the file “Mgt301-EquationsInWord” for instructions on how to do this if you’re not familiar with the Equation Editor in Word.)
Your discussion should focus on the behavior of the data through time. Resist the temptation to guess why the data moved as it did unless you’re prepared to justify your reasoning. (It is highly unlikely that you are prepared to discuss the reasons why unfamiliar data moved as it did.) Keep in mind that TSF is not a causal forecasting method; it simply says, “This is what to expect in the next X time periods because all the factors that affected the data, whatever they were, are assumed to remain about the same as they’ve been in the past.”
Error Analysis:
What are the weak points in your model? What is the value of the MAPE? Are the errors random (did they pass the runs test)? What were the numeric results of the runs tests? What were the numeric values of the confidence limits of the runs tests? What does the distribution of the errors look like? Are the 2σ error bands uselessly wide or are they pretty tight?
Here is where you tell the truth, even if the truth is not what you or the boss want to hear. If your model has a large MAPE, if it failed one or both runs tests, if the errors are not normally distributed, if you have no confidence in the model – say it! And then say that this method doesn’t seem to be appropriate for your data and that other methods should be explored.
You will include a graph of the errors and the error distribution, nicely labeled.
An optional, but highly useful, addition to this section is a “forecast of a known future.” What this means is that you re-do the whole forecast from the beginning, but leaving out the most recent data, and forecast the most recent data. How well does this method work when you know what the results ought to be?
Recommendation(s):
The only reason for doing something like this is to use it as the basis for managerial action. What do you want the reader (now it’s your boss) to do with this data? Resist the temptation to introduce new topics in this section!
Appendix:
This is optional. It is so optional that I probably won’t read it, and you’ll gain no additional points for including it. If you decide to include an appendix, it holds all the other stuff that you did in order to write this report. Any references in the main body of the text to these tables and/or graphs must use Figure numbers, Table numbers, page numbers, and the like.
What follows is a sample report that I wrote some time ago. It shows the kind of write-up that I’m looking for.
Is Global Warming Local?
Robert Todd, Ph.D.
University of Bridgeport
Introduction:
This report will present data from the National Oceanographic and Atmospheric Administration (NOAA) for the State of Connecticut from 1895 through 2004. The most variable month (January) and the least variable month (July) are compared and the next ten years for those months will be forecast, including a 95% confidence interval.
The first part of the report will be a discussion of the assumptions made. Following that, the results will be presented, along with a tabular forecast. Next, an analysis of the errors is presented. Finally, a conclusion is drawn as to whether global warming is affecting Connecticut.
Assumptions:
The method used is time series forecasting. Its primary assumption is that whatever has been happening since 1895 will continue to happen through at least 2014.
It is not known how the average temperatures were determined – whether they represent the average midday high at one location, or the average daily highs over several cities, or the daily average from one or several cities. Further, the concept of “degree-days” and even “degree-hours” may or may not have been included. The same range of daily temperatures could represent considerably colder or warmer days, depending on how much time passed at the daily high or the daily low. Still further, global warming can occur if either the daily high or the daily low is rising. It is assumed that the scientists at NOAA have decided which method is most consistent and most accurately represents the climate changes in Connecticut.
Although some years were hotter or colder than others, none of the data points represented an outlier. There were no missing data points that would have required an approximation. Finally, there were no consistent cycles present in the data, so that no de-cycling procedures were necessary.
Discussion and Results:
The idea that global warming is occurring is widespread with many media outlets providing dramatic warnings for a dire future. The data for the last 110 years in Connecticut give some credence to this phenomenon, but the evidence is hardly overwhelming. For example, the warmest Januarys occurred in 1932 and 1913, and the warmest Julys occurred in 1913 and 1912. If there were a consistent warming trend, one might expect that the warmest years would be closer to the present. Similarly, the second coldest January occurred in 1962, and 2004 had the 4th coldest January. Figure 1 shows the data.
Figure 1: January and July Temperatures in Connecticut for 1895 – 2004.
When a trend is fitted to both these lines, Figure 2 is the result.
Figure 2: January and July Temperature trends in Connecticut for 1895 – 2004.
It can be seen that July is beginning to warm up, but only in the second half of the 20th century. It’s a little difficult to see in the graph, but July got steadily cooler from 1907 through 1947 and has been slowly warming over the last 60 years. Only in 1992 did July reach the level of warmth that occurred in 1907. Similarly, January was warmest, on average, in 1922, and has only in 2002 returned to that level of warmth.
The trend equation for January is
The positive value of the first term’s coefficient means that eventually, y will rise sharply. In particular, by 2114 this trend predicts an average January temperature of 57 degrees Fahrenheit. This is, I hope, quite unrealistic.
The trend equation for July is
The negative value of the first term indicates that eventually, y will fall off. Its smaller value means that the cooling of July will be far into the future. In fact, by 2114, the trend equation predicts just about the same temperature as today.
The 95% confidence intervals for both months are indicated by the “2σ” bands immediately above and below the trend lines.
Here is the actual forecast.
Year Jan-Low Jan Jan-High Jul-Low Jul Jul-High
2005 18.12 26.87 35.61 68.07 71.68 75.28
2006 18.22 26.96 35.71 68.08 71.69 75.30
2007 18.32 27.07 35.81 68.10 71.71 75.32
2008 18.42 27.17 35.92 68.11 71.72 75.33
2009 18.53 27.28 36.02 68.13 71.74 75.35
2010 18.64 27.39 36.14 68.14 71.75 75.36
2011 18.76 27.50 36.25 68.16 71.77 75.38
2012 18.87 27.62 36.37 68.17 71.78 75.39
2013 18.99 27.74 36.49 68.19 71.80 75.41
2014 19.12 27.87 36.61 68.20 71.81 75.42
Table 1: Ten-year forecast for July and January for Connecticut, 2005 – 2014.
As can be seen, the model predicts that the January temperature will rise about 1 degree over the next ten years, but July will rise only about a tenth of a degree.
Error Analysis:
If a model has captured all the information contained in the data, the errors will be small, random, and approximately normally distributed. The errors for both January and July are shown in figures 3 and 4. Although both graphs look random, it’s necessary that they both pass the “runs test.” In both cases, the errors proved to be random.
Both graphs are shown at the same scale. The MAPE for January is 14.2%, which isn’t bad, but not as small as would give confidence in the model’s prediction. For July, however, the MAPE is less than 2% (1.93%), which gives us more confidence.
The smaller errors for July are reflected also in the distribution of errors. Figure 5 shows that, for both months, the errors are approximately normally distributed.
Figure 3: Model/Data Errors for January
Figure 4: Model/Data Errors for July
Figure 5: Error distribution for January and July
Conclusion:
The model developed for forecasting the next ten years’ average temperature for January and July in Connecticut seems to indicate that the overall warming trend is a winter phenomenon, at least in Connecticut. The model predicts no appreciable rise for July for the next 100 years, while the rise in temperature for January is much more substantial.
Based on the small MAPE for July, more credence should be placed on the July forecast than on the January forecast. That being said, worldwide data does seem to support the notion that winter temperatures are rising faster than summer temperatures. Before this conclusion can be generalized, however, each month of the year should be subjected to a similar analysis.
The confidence one should place on either of these models, however, should be tempered by the knowledge that global warming and cooling has been taking place over billions of years, and that we find ourselves today in an “interglacial period” of warming while we wait for the next Ice Age.
Potential Penalties: (This is not part of your report.)
Failure to name your files correctly will result in a one letter grade penalty. If your documents are late, by even one minute, there will be a one letter grade penalty for each 24 hours that it’s late. (It will be the clock on my computer that determines when the documents arrive.)
[Worked example worksheet (monthly potato chip sales, 2010–2015), not reproduced here: columns for period number, year/month, sales, outlier flags, linear trend and differences, the T^(3/2), T, T^(1/2), T^(-1/2) and Const basis columns, RegrLine, 12-Mo MA, CMA, RSI, ASI, Model, PctErr, the above/below and up/down run counts with their confidence limits, the error-distribution bins, MAPE, and the 2016 monthly forecast with lower and upper limits.]
Forecasting
Forecasting
Here are some facts about forecasting. The first two are true for just about every business bigger than a kid’s curbside lemonade stand.
1. Everything in business is driven by “the forecast.”
2. Forecasts are almost always wrong.
3. Forecasts are either quantitative or qualitative. Quantitative forecasts are causal, explanatory, or time series. Qualitative forecasts are based on judgment and expert opinion. Both quantitative and qualitative forecasts are numerical; it is the method – not the result – that warrants the name.
Causal: Explains why the variable of interest changes.
Explanatory: Explains that the variable of interest changes when one or more other variables change, and by how much, but not why.
Time Series: Doesn’t explain anything; it just says “If there have been no major changes in the business environment lately and if you don’t expect any major changes in the near future, this is what’s coming. More or less.”
4. Forecasts are either aggregate or itemized.
Aggregate forecasts tend to be more accurate, because errors tend to cancel each other out. Suppose we accurately forecast enrollment for UB for next Fall. The CFO will praise us, because we got the tuition revenue exactly right. But suppose we have twice as many engineering students show up as we expected, and half as many liberal arts students. Our planning, staffing and scheduling is now worthless.
Itemized forecasts can be itemized at a variety of levels. Continuing the UB example, we can forecast enrollment for, say, the School of Business, the Accounting Department, or for Accounting 101. The more finely we focus our attention, the less accurate the forecast is likely to be, because we’re dealing with smaller and smaller samples.
5. Limitations of forecasts.
Some things just can’t be forecasted, at least not with anything even close to useful accuracy. Natural phenomena, like earthquakes and hurricanes, are simply not well enough understood to forecast.Economic phenomena, like booms and busts, or even ordinary economic indicators like the DJIA or unemployment, are notoriously resistant to forecasting techniques.
6. The Forecast “Horizon.”
One of the distinguishing characteristics of a forecast is its horizon – the farther the horizon extends into the future, the less accurate the forecast. That means “the farther you go beyond the region where you have data, the less accurate the forecast will be.”
Local weather forecasters are pretty good at forecasting 4 or 5 days out, but anything more than two weeks into the future is pretty much a guess. The same is true for business forecasts. Short horizons, say, up to a year, can be quite good, but any forecast beyond that is a crap shoot. It is also necessary to know the forecast error. How much “wiggle room” is included? Saying that next summer will be warmer than winter isn’t terribly helpful.
Time Series Forecasting (TSF):
Basic Assumptions:
1. The future will be an extension of the past. Everything that’s been happening will continue to happen, at least across the forecast horizon. (If this assumption isn’t warranted, then this technique is worthless.)
2. All the causes of variation in the variable of interest are imbedded in the data. The causes of variation, whatever they are, caused the variable of interest to do what it did. Therefore we don’t need to know what those causes are, just that they will continue as they have in the past.
TSF compared with other methods:
Multiple Regression (MR): You choose the variables on which you think the variable of interest depends. You form a linear relationship among the variables and the computer determines the coefficients, like this:
y = ax1 + bx2 + … + kxn + C
Here, y is the variable of interest, say, next month’s sales, and the x’s represent the independent variables we believe it depends on, say, the amount of money we spend on advertising, current inventory levels, and so on. Plug in the values of the independent variables, apply the coefficients, and out comes next month’s sales – in theory.
Assumptions of MR:
1. The variable of interest, y, is dependent on the xi’s.
2. The variable of interest depends on all of them and does not depend on anything else.
3. Although the usual formulation requires that the variable of interest be linearly dependent on each of the independent variables, non-linear relationships can be included.
Problems:
1. If the assumptions are wrong, the forecast will be less valid.
2. If you forgot an important variable, the forecast is less valid and less accurate.
Advantages of TSF vs. MR:
1. You cannot leave out an important variable, or include an unimportant variable.
2. In terms of implementation, Excel’s built-in Regression tool is static – you have to re-run it each time you make a change. (We will generate the important outputs of Excel’s built-in tool ourselves, with formulas, so that they respond immediately to any changes.)
Disadvantages of TSF vs. MR:
1. You gain no insight into the causes of variation.
2. Occasionally, summed periodic dependencies appear to be aperiodic. In ordinary American this means that if your sales are subject to seasonal variation as well as longer term business cycles, the combination of both effects may mask any apparent regularity and appear to be basically random.
Moving Averages (MA):
1. Moving Averages are simple, but they do not account for periodic variation and they are lagging indicators.
2. They are only valid for “stationary” series (i.e., no trend, no cycles). Note: This is what books say. Moving averages will follow consistent trends very nicely, and depending on the period you choose, can “see through” cyclical or seasonal data.
3. MA places equal weight on all data values, regardless of how long ago they occurred, but there are variations on this technique (see next section) that preferentially emphasize more recent data and can account for cyclical variation. But this technique is still a lagging indicator.
Exponential Smoothing (ES):
Exponential smoothing is like a moving average, but it places more weight on more recent values and less and less weight on values from the past. If F(t) is the smoothed value for time period t and A(t) is the actual value for time period t, then the forecast f(t+1) for the next period is given by f(t+1) = F(t) = αA(t) + (1 – α)F(t – 1), where α is a constant between 0 and 1. (A short numerical sketch follows the list below.)
1. ES is simple, but depends on your choice of α, and is a lagging indicator.
2. ES is valid only for stationary trends.
3. ES can be extended to account for linear trends. When it is, it’s called “Holt’s Method.” If you’re interested, you can Google this.
4. ES can be extended to account for linear trends and seasonal variation. When it is, it’s called “Winters’ Method.” You can also Google this.
5. TSF does all this (and more) at least as easily. Further, TSF can relatively easily account for non-linear trends and multiple cycles “at no additional cost.”
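If you’d like to see the arithmetic spelled out, here is a minimal Python sketch of simple exponential smoothing. It is not part of the Excel project; the sales numbers and α = 0.3 are made up purely for illustration.

# Simple exponential smoothing: F(t) = alpha*A(t) + (1 - alpha)*F(t-1); the forecast for t+1 is F(t).
# Everything here is an invented illustration.
alpha = 0.3
actuals = [120, 132, 101, 134, 90, 130, 140]      # A(1) ... A(7)
F = [actuals[0]]                                  # seed F(1) with the first actual value
for a in actuals[1:]:
    F.append(alpha * a + (1 - alpha) * F[-1])
print("smoothed values:", [round(f, 1) for f in F])
print("forecast for the next period:", round(F[-1], 1))

Notice how quickly the influence of older values fades: each step keeps only (1 – α) of the previous smoothed value, which is exactly the “lagging indicator” behavior described above.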
Time Series Forecasting (TSF):
A “time series” is a set of equally spaced observations of the variable of interest made over time. In general, the method that follows works like this.
First we will “clean up” the data by removing outliers, if any. Then we will compute a non-linear trend line and extend that line into the future across the forecast horizon. Next we’ll calculate any seasonal effects and add them to the trend.
This method works better than any other for non-random time series. Unfortunately, most economic indicators (especially the stock market) are pretty much random. Fortunately, most business data (like sales, revenue, costs) are much less random.
TSF – The Details:
1. Plot the data, y(t) vs. t. What does it look like? The variable of interest (hereafter, just “y(t)”) changes as time goes by. If y(t) exhibits a pattern, it is an ideal candidate for TSF. The pattern can be complex, perhaps including both seasonal and cyclical variations on top of variable rates of rising and declining, but is nevertheless a pattern.
If there is no detectable pattern, and the time series is essentially random, only the most general kind of forecast is possible. These tend to be much less helpful.
Figure 1. Sample Monthly Sales Data
Our example data looks like what’s shown in Figure 1: 48 months of sales data, and our job is to forecast next year’s sales, by month. When we look at this data, it looks a little like a seismograph recording. There is a general trend of rising sales, but at first glance, there’s little else that seems at all helpful and forecasting next year’s monthly sales seems impossible.
But if we look closer, there seem to be some peaks and valleys that might occur regularly. Using Excel to identify which months have the highest sales, and our eyes to look at the actual data, we notice that the 6th month and the 18th month have noticeable peaks, and that the 9th month and the 21st month have noticeable valleys, or troughs. Then we notice that the peaks are 12 months apart, as are the troughs. Further examination shows that this pattern repeats across all 48 months. Oh, goody! That’s lucky for us, as we’ll soon enough see.
2. Remove outliers, if any. Outliers are the values of y(t) that represent rare, non-repeating occurrences. Statistically, outliers are points more than ±3σ from the mean. A better method for recognizing outliers is to remove points that are more than ±3σ from the trend-line.
A word of warning is necessary here. The computer will find all the points that are outside the standard statistical requirement for being an outlier. These points may or may not actually be outliers and you will have to use your judgment. An example is jewelry sales in the U.S. Jewelry sales are fairly constant throughout the year but peak dramatically in December. I’ve seen data for jewelry sales and the computer invariably says that all the December sales are outliers. Statistically, they are outliers, but that’s the nature of the business. The key phrase for determining which points are outliers is “rare, non-repeating” points.
Excel formula for identifying outliers: Name the range containing the data, “Data”. Then, in some cell above that range, type =AVERAGE(Data). Name that cell “Avg”. In the cell just below the average, type =STDEV.S(Data). Name that cell “StdDev”. In the column next to the data, type =IF(ABS(Data-Avg)>3*StdDev,“OUTLIER”,“”). Pop quiz: Why do we need the absolute value (ABS) function?
A better formula (trend-based outliers):
i. In column A, number the data values as 1,2,3, etc. Name that range “PtNbr” or “Period”.
ii. With the chart selected: Chart Tools → Design → Add Chart Element → Trendline → More Trendline Options → choose “Linear” and check “Display Equation on chart” → Close. (LClick on the formula and drag it to where you can see it.) [You can also LClick on the “+” sign at the top right corner of the graph and go directly to “Trendline.”]
iii. Copy formula into col C, using data point numbers in col A as “x” values as follows:
=2.54*PtNbr+643.1 (for example) and copy down. Name this range “LinearTrend”. Of course, you will use the real coefficients based on your data. Pop Quiz: What do the constants represent, in business terms? The answer is NOT “slope” and “intercept”! While that’s correct for your algebra class, we need to understand what the numbers mean in business terms.
iv. Create column D as the difference between the linear formula and the data values.
=Data-LinearTrend and copy down. Name this range “Diff”.
v. Find the standard deviation of the differences and check them for outliers. Pop quiz: Why is this better?
Outliers are replaced as follows: y_t = ½(y_(t-1) + y_(t+1)). In the rare case where there are two outliers together, y_o1 = (2/3)y_a + (1/3)y_b and y_o2 = (1/3)y_a + (2/3)y_b, where y_a and y_b are the good values just before and just after the pair. These formulas are what you will find in reference books. It is an extremely marketable skill to be able to translate formulas from books into Excel. The key to doing this is to understand what each symbol represents and where those values are on the spreadsheet. In this case, y_t represents the outlier value at time t, and y_(t-1) and y_(t+1) represent the data values just before and just after the outlier value.
There is a caveat here – if yt is at a peak (or trough) of the time series, you need to use your judgment.
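For those who like to check the logic outside Excel, here is a rough Python sketch of the trend-based outlier test and the neighbor-averaging replacement described above. The series is invented (with one spike injected on purpose), and the 3-standard-deviation cutoff is the one from the text.

import numpy as np

rng = np.random.default_rng(0)
t = np.arange(1, 49)
y = 4.5 + 0.01 * t + rng.normal(0, 0.05, size=t.size)   # invented series: mild trend plus noise
y[20] = 7.0                                             # inject one obvious, non-repeating spike

# Fit a straight-line trend, then measure how far each point sits from that trend.
slope, intercept = np.polyfit(t, y, 1)
diff = y - (slope * t + intercept)
flagged = np.abs(diff) > 3 * diff.std(ddof=1)           # "more than 3 sigma from the trend line"

# Replace a flagged interior point with the average of its neighbors: y(t) = (y(t-1) + y(t+1)) / 2.
y_clean = y.copy()
for i in np.where(flagged)[0]:
    if 0 < i < len(y) - 1:
        y_clean[i] = (y_clean[i - 1] + y_clean[i + 1]) / 2

print("flagged point numbers:", t[flagged])

The same warning applies here as in the spreadsheet: the code flags whatever is statistically unusual, and you still have to decide whether a flagged point is genuinely “rare and non-repeating.”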
3. Replace missing values. Use the same method as for removing outliers.
4. Find the trend. Here’s where this whole TSF technique shows its value. What we’re going to do is called “multiple regression.” This whole topic is usually covered in statistics courses under the title of “Correlation and Regression.” I’ve uploaded a fairly detailed description of this topic and its application to TSF to Canvas in a file called “Mgt301-Correlation&Regression”. Read it on your own and ask questions in class about anything you don’t understand.
Now we all know that the equation y = ax + b is a straight line. Very few real-life situations are straight lines. They’re used because it’s a simple equation and back in the day, before cheap and powerful computers, that’s all we could do by hand. Now we can use better equations, but there’s something I need to say before we go any further.
Excel offers a number of built-in regression equations including polynomials up to degree six, logarithmic, exponential, and moving average equations. These are almost never appropriate because time-based business data (like sales) doesn’t change terribly rapidly. Even a second-order polynomial rises and falls more rapidly than real business data. Exponential equations are frequently used, but they’re only good for a limited time frame. Any CEO who tells his people and/or stockholders that “our goal is to maintain 5% growth” doesn’t understand basic math. A constant percentage growth rate is an exponential function, like the one you learned in Finance 309: FV = PV(1 + r)^n, where r is the periodic interest rate and n is the number of periods. If you graph this, it eventually rises literally “exponentially fast.” Exponential growth simply cannot be sustained.
So what we need is an equation that doesn’t rise or fall too rapidly, but does allow for little dips and rises as time goes by. Here is such a function: y(t) = a·t^(3/2) + b·t + c·t^(1/2) + d·t^(-1/2) + e.
The first term rises faster than linear (or falls, depending on the sign of a), the second term is linear, the third rises more slowly than linear, and the fourth decays slowly toward zero, so together the curve can bend without rising or falling too quickly. The last term is a constant which moves the whole curve up or down without changing its shape.
All we need to do is figure out what the coefficients ought to be, so that our trend equation describes, as accurately as possible, the correlation between sales and time. Here’s how.
We are going to determine how well our data correlates with time. “Time” for us is simply the chronological progression of the data. That means the first time period is “1”, the second is “2”, and so on for as many time periods as you have for your data.
This translates, mathematically, to finding the coefficients of the function as described on the previous page. When you want to find out how to do that, you can Google it, but to do that in a way that Google can figure out what you’re asking you should know that what we’re doing is called “finding a least squares fit of an over-determined non-linear regression equation.” Whew! You can also look through an applied math book for the same thing. When you do either of those things, you’ll find that the solution for the coefficients is given by
c = (X^T X)^(-1) X^T y.
Not much help, huh? Here’s what it’s saying.
Although this looks like advanced math, involving matrices, their transposes, products, and inverses, its derivation is based on the following. (The transpose of a matrix is found by reversing its rows and columns: the first row becomes the first column of the transpose, the second row becomes the second column, and so on.)
The simple equation, 3x = 7, can be solved using fancy words, like this. “Pre-multiply both sides by the inverse of the coefficient.” That gives 3^(-1)·3x = 3^(-1)·7, or (1/3)·3x = (1/3)·7, or x = 7/3. OK? Now let’s remember systems of equations from high school, something like this:
a1·x + b1·y = c1
a2·x + b2·y = c2
This is still fairly easy to solve, right? Now, I can write that system using matrix notation as follows:
| a1  b1 | | x |   | c1 |
| a2  b2 | | y | = | c2 |
(If your high school was like my high school, you remember this; otherwise, trust me.) I can express this in general notation, like this:
Ax = b,
where A is the coefficient matrix, x is the vector of unknowns, and b is the right-hand side.
Now this matrix form of the equation is an exact analogue of the original equation, 3x = 7, and the same rules apply for solving it, namely, pre-multiply both sides by the inverse of the coefficient. Doing that gives
A^(-1)·A·x = A^(-1)·b, or x = A^(-1)·b,
which is the solution for any system of equations for which the coefficient inverse exists. Something similar produces the equation above, but it’s slightly more complicated because the coefficient matrix isn’t square. Here’s what that means.
In the work we’ll be doing, we usually have more equations than unknowns. You know that we can solve two equations in two unknowns, 3 equations in 3 unknowns, and so on. But we will have, for example, 48 equations in 5 unknowns. This is called an “over-determined” system and the method for solving it is called “Least Squares” minimization. That is, we will find a function that minimizes the total squared distance of each of the points from the function. Its derivation isn’t important for us, but the solution is.
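If you want to see the “pre-multiply by the inverse” idea and the least-squares version side by side, here is a tiny Python/NumPy sketch. The numbers are made up; the point is only that np.linalg.solve handles the square case and np.linalg.lstsq handles the over-determined case for you.

import numpy as np

# A square system A x = b: "pre-multiply by the inverse" is exactly what solve() does.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])
print(np.linalg.solve(A, b))                  # the unique solution [1, 3]

# An over-determined system (more equations than unknowns): least squares instead.
X = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [3.0, 1.0],
              [4.0, 1.0]])                    # 4 equations, 2 unknowns
y = np.array([2.1, 3.9, 6.2, 7.8])
c, *_ = np.linalg.lstsq(X, y, rcond=None)
print(c)                                      # the same c as (X^T X)^(-1) X^T y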
So – we have five terms in the function we think will work well. We need a column of the values for each of them, as follows.
i. First create 5 columns, labeled “T^(3/2)”, “T”, “T^(1/2)”, “T^(-1/2)”, “Const”. In this case, “T” represents “time” rather than “transpose.”
ii. In each column, insert the formulas =PtNbr^(3/2), =PtNbr, =SQRT(PtNbr), =1/SQRT(PtNbr), 1. Notice that the formula matches the column heading except for the constant term. Copy these formulas down to the bottom of the data and make sure that Excel doesn’t increment the constant term; it should be “1” all the way down. This is the matrix “X” in the equation above.
iii. Name the range of these values “X”. (Highlight only the numbers, not the column headings!)
iv. Name the data values “Y”. (Same thing – only the numbers.)
To make the next instruction clearer, here is the equation again.
c = (X^T X)^(-1) X^T y.
As usual, we read this equation “from the inside out.” The inner-most term is X^T X. In Excel that product is MMULT(TRANSPOSE(X),X), where “MMULT” means “matrix multiply.” After finding that product we need to find its inverse. The Excel function for that is MINVERSE, which means “matrix inverse.”
The next term we need is X^T y, which is MMULT(TRANSPOSE(X),Y). Finally we need to multiply (X^T X)^(-1) by X^T y, which is one more MMULT wrapped around those two pieces.
This will give us the answer we want, but the numbers will be in a column and we want them to be in a row, so the last operation is to TRANSPOSE that column to a row.
So – finally we’re ready!
v. Highlight one row of five cells immediately above the column headings of X and, without touching the mouse, enter the following array formula. This is all one formula; it doesn’t fit on this page, but you will write it all in one line in Excel.
=TRANSPOSE(MMULT(MINVERSE(MMULT(TRANSPOSE(X),X)),
MMULT(TRANSPOSE(X),Y))) and press CTRL-SHIFT-ENTER.
These 5 values are the coefficients of the trend equation. Name this range “Coeffs”. Even though the explanation took almost 3 pages, all you have to do is enter the formula; it takes less than 30 seconds.
Once you have the coefficients of the equation of the trend that most closely follows your data, write that equation into an appropriate column, using the “data point number” cells, which you’ve named “PtNbr”, as values of t. Let’s assume that X is in columns I through M and that the first row of numbers is row 6. Label column N as “RegrLine”. Then, in cell N6, write the following formula:
=SUMPRODUCT(Coeffs,I6:M6) and copy this equation down through the last data value. Name this range “RegrEqn” or “NonlinearTrend”.
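Here is the same step 4 sketched in Python/NumPy, for anyone who wants to check their Excel coefficients against something independent. The sales series below is invented; with your own data you would substitute the real 48 values.

import numpy as np

t = np.arange(1, 49, dtype=float)
sales = 4.5 + 0.01 * t + 0.2 * np.sin(2 * np.pi * t / 12)   # invented stand-in for the data column

# The X matrix: one column each for T^(3/2), T, T^(1/2), T^(-1/2), and the constant 1.
X = np.column_stack([t**1.5, t, np.sqrt(t), 1 / np.sqrt(t), np.ones_like(t)])

# c = (X^T X)^(-1) X^T y, written exactly as the formula reads ...
c = np.linalg.inv(X.T @ X) @ X.T @ sales
# ... though np.linalg.lstsq solves the same least-squares problem more robustly.
c_check, *_ = np.linalg.lstsq(X, sales, rcond=None)

trend = X @ c        # the "RegrLine" column: the same SUMPRODUCT of Coeffs with each row of X
print(np.round(c, 4))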
5. If the data is seasonal, add the seasonal effects to the trend line. When I use the word “seasonal,” I’m not implying that the data changes with the seasons the way the weather does. Often, business data does change with the seasons because of holidays, summer vacations, and so on, but “season” actually represents any cyclical pattern that repeats as time passes.
i. Decide on the length of the “season.” How many data points are there before the pattern repeats exactly? If the pattern repeats only approximately – the peaks and troughs don’t come back at the same points in each cycle – then that’s not good enough; your data isn’t seasonal. The good news is that doing this step for non-seasonal data doesn’t hurt much. (Pop quiz: Why not?)
ii. Create a Moving Average (MA) of length “season.” This is usually 12 for monthly data, 4 for quarterly data, 5, 6, or 7 for daily data, and so on. Regardless of what’s “usual,” use your eyes and your brain to find the season length. Here’s how to do that.
The example data has peaks every June and November, and a trough every January, April, and September. We can see this simply by looking at the plot of the data. Since the pattern of higher and lower sales repeats every 12 months, we’ll create a 12-month moving average.
A “moving average” is the average of sales for one full period, which in our case is 12 months. We take a full period average because the pattern repeats every period and because the peaks and troughs will be “averaged out,” i.e., the above-average months and the below-average months, when averaged together, should create a rather smooth line.
The average of the first 12 months is found with a simple AVERAGE function, namely, =AVERAGE(D6:D17) if the actual raw data is in column D. As we “move” the average, the next 12 month period is from D7:D18 (February through January), the next is from D8:D19 (March through February), and so on.
But where do we write this equation? We now have column N that contains the trend equation. Let’s put the moving average in column O. So we’ll label this column (in cell O5) “12-Mo MA”.
The average monthly sales for the first year really ought to be put at the midpoint of the year. The midpoint of the year is after June and before July, but there’s no cell for “after June but before July” so we’ll put the formula in the cell for June, just because.
Now copy the formula down until we get to the last June in the data. The formula at that point should be, for our example data, =AVERAGE(D42:D53). D53 is the last cell that has any data in it.
iii. To make the MA reflect the true midpoint of the year, we need to “center” it if the length of the season is an even number. The value that’s in the first June cell really ought to be between June and July. The value that’s in the first July cell really should be between July and August. So if we find the average of those two cells, we’ll get the correct average for “July.”
So, we’ll create a column for “CMA” (Centered MA). In cell P5, enter the label “CMA.” In the cell for “July” we’ll average the MA values for June and July with the formula =AVERAGE(O11,O12) and copy that down to the last MA value.
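Here is a short pandas sketch of the 12-month moving average and the centering step, in case it helps to see the bookkeeping in one place. The series is invented, and the shift() calls simply place each average in the “June” and “July” positions the way the write-up describes.

import numpy as np
import pandas as pd

t = np.arange(1, 49)
sales = pd.Series(4.5 + 0.01 * t + 0.2 * np.sin(2 * np.pi * t / 12), index=t)   # invented data

# 12-month moving average; rolling() labels it at month 12, so shift it back to "June" (month 6).
ma12 = sales.rolling(window=12).mean().shift(-6)

# Centered MA: the average of two adjacent 12-month averages, labeled at "July" (month 7).
cma = (ma12.shift(1) + ma12) / 2

print(pd.DataFrame({"Sales": sales, "12-Mo MA": ma12, "CMA": cma}).head(15))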
iv. Look for “inflection points.” An inflection point is a point where the CMA makes a rather abrupt change of direction. (There’s an illustration of this in the file “Mgt301-Correlation&Regression.”)
The basic assumption of this method is that there have not been any major changes in the business environment at least as far back as your data goes, nor will there be any major changes in the business environment for the period of the forecast horizon. The CMA is especially valuable for finding such changes.
The method is simple: just graph the raw data and the CMA on the same graph. If there’s an inflection point, you cannot use any of the data that accumulated prior to the inflection point. Pop quiz: Why not?
If you find an inflection point, throw away all the data prior to that point and start over with the remaining data. (Don’t really throw it away, because when you get to writing up your results, you’ll need to show the original data with the inflection point before proceeding with the rest of your report.)
v. Calculate the Raw Seasonal Index (RSI). A “seasonal index” is the ratio of actual sales for a particular month compared to the average sales for that year. Because the “average sales” is a moving average, we always get the precise seasonal index for a particular month based on the average for which that particular month is the middle month. (That’s a confusing sentence; you may need to read it several times.)
So – another column. Write the label “RSI” in cell Q5. For the first value for which the CMA exists write the formula =D12/P12 and format it as a percentage. The result will be how much that month’s sales exceed or fall short of the average. Then copy that formula down to the last cell for which the CMA exists.
When you look at the values, you’ll notice that September, for example, always has a value less than 100%. For the example data, the values are in the upper 80’s, which means that September sales are about 86% or 87% of average. June, on the other hand, has values that are 10% to 15% above average.
vi. Calculate the Average Seasonal Index (ASI). This goes in column R. You noticed that the RSI values for a particular month are approximately the same, but not exactly the same. So this step will find the average RSI for all occurrences of the same month.
This is an annoying formula – it is the average of all the RSI values for their particular point in the data. For example, if the data is monthly, it is the average of all the July values of RSI, then all the August values, and so on. You can’t simply copy the formula down because the cell references will advance so that they no longer include data toward the top of the sheet and will include non-existent data at the bottom.
The first RSI value we have in our example data is for July, so the average RSI for all Julys is =AVERAGE(Q12, Q24, Q36). We will copy this formula down for 12 months, if possible. In this case it is possible; that won’t always be the case. The important point is to write a formula that includes all the occurrences of each month.
Once you’ve calculated the ASI for 12 months, you know the ASI for all months. So you can extend the ASI for all the months of data and into the future across the forecast horizon by simply copying the 12 calculated ASI values. Name this range “ASI”.
So, for the first January, the formula in cell R6 is simply =R18. This can be copied down through the first June. From the second July, the formula is =R12 and that can be copied as far into the future as your horizon extends.
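Continuing in the same spirit, here is a self-contained pandas sketch of the RSI and ASI calculations. The first few lines repeat the invented series and CMA from the sketch above so this one runs on its own.

import numpy as np
import pandas as pd

t = np.arange(1, 49)
sales = pd.Series(4.5 + 0.01 * t + 0.2 * np.sin(2 * np.pi * t / 12), index=t)   # invented data
ma12 = sales.rolling(12).mean().shift(-6)
cma = (ma12.shift(1) + ma12) / 2

rsi = sales / cma                                    # Raw Seasonal Index, only where the CMA exists
month = pd.Series((t - 1) % 12 + 1, index=t)         # calendar month number, 1 through 12
asi_by_month = rsi.groupby(month).mean()             # Average Seasonal Index, one value per month

asi = month.map(asi_by_month)                        # the ASI repeated down every row
print(asi_by_month.round(3))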
6. Add the seasonal effects to the trend line. This is easy – just multiply the trend equation value by the ASI. Put the label “Model” in cell S5, the formula =RegrEqn*ASI in cell S6 and copy down to the last data value.
Now graph the raw data (column D) and the Model data (column S) on the same graph. For our sample data it should look like the graph below (Figure 2). The agreement of the Model and the data is so close that it’s hard to tell which line is which.
7. Extend the model across the forecast horizon. This, too, is easy.
Extend the time periods across the horizon. Our forecast horizon will be one year, so that means you need to extend the numbers in column A from 49 through 60. Highlight the bottom-most row of the equation columns (I53:M53) and copy it down through the horizon. Copy the ASI formula down through the horizon. Finally, copy the model formula down through the horizon. Name this range “Model”.
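Step 7 in the same Python sketch might look like the lines below. The coefficients and seasonal indices here are made-up stand-ins; in practice they are the Coeffs and ASI values you just computed.

import numpy as np

c = np.array([0.0002, -0.01, 0.15, 0.3, 4.2])        # made-up stand-in for the five fitted coefficients
asi = {m: 1.0 for m in range(1, 13)}                 # made-up stand-in for the 12 average seasonal indices
asi.update({6: 1.05, 11: 1.04, 9: 0.94})

t_all = np.arange(1, 61, dtype=float)                # the 48 data months plus a 12-month horizon
X_all = np.column_stack([t_all**1.5, t_all, np.sqrt(t_all), 1 / np.sqrt(t_all), np.ones_like(t_all)])
trend_all = X_all @ c                                # the trend, extended across the horizon

months = ((t_all - 1) % 12 + 1).astype(int)
model = trend_all * np.array([asi[m] for m in months])   # Model = trend x ASI, also extended
forecast = model[48:]                                # the last 12 values are next year's forecast
print(np.round(forecast, 2))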
When you graph the result (Data and extended Model), you’ll see the forecast. It’s shown on the next page.
Figure 2: Example Data and Model
Figure 3: Example Data and Next Year’s Monthly Forecast
The only thing left to do is to label the graphs nicely. They’ll need a title, axis titles, and actual dates on the x-axis.
In the professional world (though it’s not required for this project), you would explore other curve-fitting techniques and other forecasting methods. Choose the forecast that best matches your “gut instinct,” which is based on your intimate and detailed knowledge of your particular industry and your particular requirements. The method given here, however, is often the best.
Now comes an absolutely critical part of any forecast – an error analysis.
8. Perform various error analyses.
i. First, calculate and plot the errors. The source of errors is always the model; the data can never be wrong, it simply is what it is. So the error is the discrepancy between the data and the model, expressed as a percentage relative to the data. Here’s how.
Yet another column, this time column T. Put the label “PctErr” in cell T5. In cell T6, the formula is =(Model-Data)/Data and format it as a percentage. Then copy the formula down to the bottom of the data. (There’s no way we know what the future error will be until we get the future data.) Name this column “PctErr”.
Create a graph of the errors. If the model has captured all the information that’s in the data, the errors will be small, random, and approximately normally distributed. We need to check each one of those criteria.
ii. Is the error small? “Small” is a subjective judgment. Most people define “small” as meaning “if the average absolute value of the errors is less than 10% then the errors are small.”
That measure of small-ness is called the MAPE, or Mean Absolute Percentage Error. Its formula is =AVERAGE(ABS(PctErr)) and CONTROL-SHIFT-ENTER. Put this formula in cell T4 and the label “MAPE” in cell T3.
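As a sanity check, the MAPE is a one-liner in Python. The two arrays here are invented; in the spreadsheet they are the Data and Model columns over the fitted period.

import numpy as np

data  = np.array([4.65, 4.56, 4.57, 4.46, 4.59, 4.71])    # invented actuals
model = np.array([4.61, 4.58, 4.54, 4.50, 4.62, 4.69])    # invented model values for the same months
pct_err = (model - data) / data
mape = np.mean(np.abs(pct_err))
print(f"MAPE = {mape:.1%}")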
iii. Is the error random? “Randomness” is a surprisingly difficult topic. We are hard-wired from hundreds of thousands of years of evolution to recognize patterns. If I asked you to identify which of the following sequences of five digits is random, which would you choose? 1-2-3-4-5; 3-9-2-7-6; 4-4-4-4-4. I’ll bet that most of you chose the center sequence. The answer is that all of them could be random. We have such an automatic response to patterns that it’s hard to believe that 1-2-3-4-5 could possibly be random.
People who’ve studied “random” sequences have come up with two measures of randomness called “runs tests.” We will perform both of them and we hope that our model passes both of them.
The first runs test is the number of “above/below” runs that a sequence has with respect to the average value of the sequence. The idea is that a random sequence will be above average and then change to below average an expected number of times.
The expected number of “above/below” runs for random data is n/2 + 1, where n is the number of data points, and the standard deviation of those runs is sqrt((n – 1)/2). If you recall your basic statistics, we can form a 95% confidence interval estimate of the true number of runs from our sample by calculating (n/2 + 1) ± 2·sqrt((n – 1)/2), where “2” is the real-world’s value for Z or t for 95% confidence. If the number of above/below runs is within that interval, we’ll agree that our errors pass the above/below runs test.
The second runs test is an “up/down” test. A random sequence will change direction an expected number of times. The expected number of “up/down” runs for random data is (2n – 1)/3 and its standard deviation is sqrt((16n – 29)/90). As before we’ll use 95% for our confidence level and the interval becomes (2n – 1)/3 ± 2·sqrt((16n – 29)/90). If the number of up/down runs is within that interval, we’ll agree that our errors pass the up/down runs test.
Now, finding the number of above/below and up/down runs is a pain in the neck (and eyes) if you try to count them from the graph. So we’ll get Excel to do it for us.
The technique that Excel uses in its regression tool and which I’ve detailed for you in step 4 (page 8) guarantees that the average error will be zero. That means the end of an above-run (or below-run) occurs when the error plot crosses the x-axis. That means that when the error values change sign, we should count an above/below run.
Now, we can’t know whether the first value crosses the x-axis; it is either above or below. The first possible chance for a sign change is when we look at the second value. Therefore the formula begins in the second row of data. The formula is =IF(SIGN(X6)=SIGN(X7),0,1) and it goes in row 7. (I don’t know that the column will be X – that depends on where you choose to put this column of values.)
Counting up/down runs is a little trickier. Here, we have to count each occurrence of a change in the sign of the slope of the error graph. Be able to tell me why the following is the correct Excel formula and why the first occurrence of the formula is in the third row of data. =IF((X7-X6)*(X8-X7)>0,0,1).
When these formulas have been copied down to the bottom of the data, simply SUM the column to find the number of runs. Then write the formulas for the confidence intervals and you’ll know whether or not your model’s errors pass these randomness tests.
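If you would rather let Python do the counting, here is a sketch of both runs tests using the expected-count formulas quoted above. The error series is randomly generated just to have something to count; with real errors you would pass in your PctErr column.

import numpy as np

rng = np.random.default_rng(1)
err = rng.normal(0, 0.02, size=43)            # stand-in for the in-sample percentage errors
n = len(err)

# Above/below runs: a new run starts every time the error changes sign (the mean error is ~0).
ab_runs = 1 + np.sum(np.sign(err[1:]) != np.sign(err[:-1]))
ab_expected, ab_sd = n / 2 + 1, np.sqrt((n - 1) / 2)

# Up/down runs: a new run starts every time the direction of change flips.
d = np.sign(np.diff(err))
ud_runs = 1 + np.sum(d[1:] != d[:-1])
ud_expected, ud_sd = (2 * n - 1) / 3, np.sqrt((16 * n - 29) / 90)

print("above/below runs:", ab_runs, "expected", ab_expected, "+/-", round(2 * ab_sd, 2))
print("up/down runs:    ", ud_runs, "expected", round(ud_expected, 2), "+/-", round(2 * ud_sd, 2))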
If your error analysis fails one or both of the randomness tests, that means that there’s information in the data that’s not captured by your model. That means that your model can be improved. If you can’t improve your model, make sure that you include an appropriate disclaimer when you write up your results.
iv. Are the errors normally distributed? There are two ways to do this. The first is to use a χ² (chi, pronounced “ki”, rhymes with “high”, squared) test and the other is to calculate a MAPE for your error distribution relative to a normal distribution. We won’t use either.
All we’re going to do is create a frequency histogram of the errors and say that the errors either are or are not approximately normally distributed. I expect that you’ll tell the truth here and not pretend that the results are better than they are.
9. Establish a confidence bound for your forecast. Again, from basic statistics, you’ll do this by adding a 2σe band (95% confidence limit) to your model and the forecast. The σe term is the standard deviation of the errors. Your forecast is now yf ± 2σe and you are ready to write up your results.
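In code form, the band is just a couple of lines. All three arrays below are invented placeholders; σe is computed from the in-sample errors and then added to and subtracted from the forecast.

import numpy as np

data     = np.array([4.65, 4.56, 4.57, 4.46, 4.59, 4.71])   # invented actuals
model    = np.array([4.61, 4.58, 4.54, 4.50, 4.62, 4.69])   # invented fitted values for the same months
forecast = np.array([4.75, 4.80, 4.83])                     # invented model values across the horizon

sigma_e = np.std(model - data, ddof=1)                      # standard deviation of the errors
lower, upper = forecast - 2 * sigma_e, forecast + 2 * sigma_e
print(np.round(lower, 2), np.round(upper, 2))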
10. Write-up:
Despite all your work in getting to this point, very little of that work (except the results) will be included in this short (2-page) write-up. (This is annoying. You’ve done a lot of work to get these numbers and you don’t get to talk about it. That’s the nature of almost all data analysis work that you’ll do. The boss wants the answer, or at best, the Cliff Notes version of your efforts. The good news is that if you are asked about how you arrived at your forecast, you’ll be on solid ground.)
If any of you have ever been in the Army, you’ve learned that there are three sections to every report or presentation: start by telling them what you’re going to tell them, then tell them, and finish by telling them what you told them. That’s basically what you’re going to do in this write-up. (It’s also very close to the way that a graduate thesis is organized.)
The first thing you need to know to do a write-up like this is who your audience is. I’m your audience. I know what you’ve done. You don’t want to give me a tutorial on time series forecasting; I’ve already been there, done that.
The next thing you need to know: say only what needs to be said. I can’t emphasize this enough. Bosses don’t have a lot of time to read a long memo. (And as of now, I have 63 of these reports to read. Do yourself a huge favor and don’t make me read more than necessary! But – be sure to say all that needs to be said. Writing is easy. Writing well is hard.)
Begin by treating this as a business memo. That means the heading of your memo will be something very much like this:
To: Prof.
From: Student
Date: mm/dd/yyyy
Subject: Forecast of <what you’ve chosen to forecast> for <dates of your forecast horizon>
Your report will have five sections. Each will be a bold section heading. Here they are.
Introduction:
This section introduces the paper, not the topic. It includes where you got the data, and a brief description of what will follow. For example, “The data for XXX can be found at XXX.com/YYY/zzz. (When I click on this link, I will go directly to your data; I don’t want to have to navigate through some web site.) First I will describe the method used and the assumptions made in this analysis. Next I will present the actual data and the model. Then, I will present both a graphical and tabular forecast and an error analysis. Finally, I will present my recommendations. The details can be found in the Appendix.” This is all that’s required of the Introduction.
The last sentence, about a detailed Appendix, is optional. If you include it, I probably won’t read it, you won’t get extra points for including it, but could lose points if you choose to include it and don’t do it well. I’ve included it here because some bosses want to see it, or at least be able to see it if they want or need to check your reasoning.
Method and Assumptions:
This section says that you will use TSF. The assumptions are those of the TSF method and whether or not you made any adjustments and why you made them. The accuracy of the data, i.e., the reliability of the source and/or how it was measured is also an assumption.
Results:
This is where you show me the results of your work. Include the following graphs: the raw data by itself; the raw data with the model extended across the forecast horizon and with the 2σ error bands added to the extended trend. You also will include a table of your forecasted values plus/minus the error bands, nicely labeled, and the trend equation, professionally written. (See the file “Mgt301-EquationsInWord” for instructions on how to do this if you’re not familiar with the Equation Editor in Word.)
Your discussion should focus on the behavior of the data through time. Resist the temptation to guess why the data moved as it did unless you’re prepared to justify your reasoning. (It is highly unlikely that you are prepared to discuss the reasons why unfamiliar data moved as it did.) Keep in mind that TSF is not a causal forecasting method; it simply says, “This is what to expect in the next X time periods because all the factors that affected the data, whatever they were, are assumed to remain about the same as they’ve been in the past.”
Error Analysis:
What are the weak points in your model? What is the value of the MAPE? Are the errors random (did they pass the runs tests)? What were the numeric results of the runs tests? What were the numeric values of the confidence limits of the runs tests? What does the distribution of the errors look like? Are the 2σ error bands uselessly wide or are they pretty tight?
Here is where you tell the truth, even if the truth is not what you or the boss want to hear. If your model has a large MAPE, if it failed one or both runs tests, if the errors are not normally distributed, if you have no confidence in the model – say it! And then say that this method doesn’t seem to be appropriate for your data and that other methods should be explored.
You will include a graph of the errors and the error distribution, nicely labeled.
An optional, but highly useful, addition to this section is a “forecast of a known future.” What this means is that you re-do the whole forecast from the beginning, but leaving out the most recent data, and forecast the most recent data. How well does this method work when you know what the results ought to be?
Recommendation(s):
The only reason for doing something like this is to use it as the basis for managerial action. What do you want the reader (now it’s your boss) to do with this data? Resist the temptation to introduce new topics in this section!
Appendix:
This is optional. It is so optional that I probably won’t read it, and you’ll gain no additional points for including it. If you decide to include an appendix, it contains all the other work you did in order to write this report. Any references to these tables and/or graphs must appear in the main body of the text, citing Figure numbers, Table numbers, and page numbers.
What follows is a sample report that I wrote some time ago. It shows the kind of write-up that I’m looking for.
Is Global Warming Local?
Robert Todd, Ph.D.
University of Bridgeport
Introduction:
This report will present data from the National Oceanographic and Atmospheric Administration (NOAA) for the State of Connecticut from 1895 through 2004. The most variable month (January) and the least variable month (July) are compared and the next ten years for those months will be forecast, including a 95% confidence interval.
The first part of the report will be a discussion of the assumptions made. Following that, the results will be presented, along with a tabular forecast. Next, an analysis of the errors is presented. Finally, a conclusion is drawn as to whether global warming is affecting Connecticut.
Assumptions:
The method used is time series forecasting. Its primary assumption is that whatever has been happening since 1895 will continue to happen through at least 2014.
It is not known how the average temperatures were determined – whether they represent the average midday high at one location, or the average daily highs over several cities, or the daily average from one or several cities. Further, the concept of “degree-days” and even “degree-hours” may or may not have been included. The same range of daily temperatures could represent considerably colder or warmer days, depending on how much time passed at the daily high or the daily low. Still further, global warming can occur if either the daily high or the daily low is rising. It is assumed that the scientists at NOAA have decided which method is most consistent and most accurately represents the climate changes in Connecticut.
Although some years were hotter or colder than others, none of the data points represented an outlier. There were no missing data points that would have required an approximation. Finally, there were no consistent cycles present in the data, so that no de-cycling procedures were necessary.
Discussion and Results:
The idea that global warming is occurring is widespread, with many media outlets providing dramatic warnings of a dire future. The data for the last 110 years in Connecticut give some credence to this phenomenon, but the evidence is hardly overwhelming. For example, the two warmest Januarys occurred in 1932 and 1913, and the two warmest Julys occurred in 1913 and 1912. If there were a consistent warming trend, one might expect that the warmest years would be closer to the present. Similarly, the second coldest January occurred in 1962, and 2004 had the 4th coldest January. Figure 1 shows the data.
Figure 1: January and July Temperatures in Connecticut for 1895 – 2004.
When a trend is fitted to both these lines, Figure 2 is the result.
Figure 2: January and July Temperature trends in Connecticut for 1895 – 2004.
It can be seen that July is beginning to warm up, but only in the second half of the 20th century. It’s a little difficult to see in the graph, but July got steadily cooler from 1907 through 1947 and has been slowly warming over the last 60 years. Only in 1992 did July reach the level of warmth that occurred in 1907. Similarly, January was warmest, on average, in 1922, and has only in 2002 returned to that level of warmth.
The trend equation for January is
The positive value of the first term’s coefficient means that eventually, y will rise sharply. In particular, by 2114 this trend predicts an average January temperature of 57 degrees Fahrenheit. This is, I hope, quite unrealistic.
The trend equation for July is
The negative value of the first term indicates that eventually, y will fall off. Its smaller value means that the cooling of July will be far into the future. In fact, by 2114, the trend equation predicts just about the same temperature as today.
The 95% confidence intervals for both months are indicated by the “2σ” bands immediately above and below the trend lines.
Here is the actual forecast.
Year Jan-Low Jan Jan-High Jul-Low Jul Jul-High
2005 18.12 26.87 35.61 68.07 71.68 75.28
2006 18.22 26.96 35.71 68.08 71.69 75.30
2007 18.32 27.07 35.81 68.10 71.71 75.32
2008 18.42 27.17 35.92 68.11 71.72 75.33
2009 18.53 27.28 36.02 68.13 71.74 75.35
2010 18.64 27.39 36.14 68.14 71.75 75.36
2011 18.76 27.50 36.25 68.16 71.77 75.38
2012 18.87 27.62 36.37 68.17 71.78 75.39
2013 18.99 27.74 36.49 68.19 71.80 75.41
2014 19.12 27.87 36.61 68.20 71.81 75.42
Table 1: Ten-year forecast for July and January for Connecticut, 2005 – 2014.
As can be seen, the model predicts that the January temperature will rise about 1 degree over the next ten years, but July will rise only about a tenth of a degree.
Error Analysis:
If a model has captured all the information contained in the data, the errors will be small, random, and approximately normally distributed. The errors for both January and July are shown in figures 3 and 4. Although both graphs look random, it’s necessary that they both pass the “runs test.” In both cases, the errors proved to be random.
Both graphs are shown at the same scale. The MAPE for January is 14.2%, which isn’t bad, but not as small as would give confidence in the model’s prediction. For July, however, the MAPE is less than 2% (1.93%), which gives us more confidence.
The smaller errors for July are reflected also in the distribution of errors. Figure 5 shows that, for both months, the errors are approximately normally distributed.
Figure 3: Model/Data Errors for January
Figure 4: Model/Data Errors for July
Figure 5: Error distribution for January and July
Conclusion:
The model developed for forecasting the next ten years’ average temperature for January and July in Connecticut seems to indicate that the overall warming trend is a winter phenomenon, at least in Connecticut. The model predicts no appreciable rise for July for the next 100 years, while the rise in temperature for January is much more substantial.
Based on the small MAPE for July, more credence should be placed on the July forecast than on the January forecast. That being said, worldwide data does seem to support the notion that winter temperatures are rising faster than summer temperatures. Before this conclusion can be generalized, however, each month of the year should be subjected to a similar analysis.
The confidence one should place in either of these models, however, should be tempered by the knowledge that global warming and cooling have been taking place over billions of years, and that we find ourselves today in an “interglacial period” of warming while we wait for the next Ice Age.
Potential Penalties: (This is not part of your report.)
Failure to name your files correctly will result in a one letter grade penalty. If your documents are late, by even one minute, there will be a one letter grade penalty for each 24 hours that it’s late. (It will be the clock on my computer that determines when the documents arrive.)
[Worksheet appendix: the 72 months of example potato-chip sales data (Jan 2010 – Dec 2015), with columns for MonthNbr, Year, Month, Sales, outlier check, linear trend, T^(3/2), T, T^(1/2), T^(-1/2), Const, RegrLine, 12-Mo MA, CMA, RSI, ASI, Model, PctErr, A/B Runs, U/D Runs, error-distribution bins, and the Jan – Dec 2016 forecast with lower and upper limits, plus the MAPE and runs-test summary statistics.]